Improved Testing Procedure for Kernel Methods with Time Series Data
Abstract
This paper develops a robust testing procedure for the nonparametric kernel method in the presence of temporal dependence of unknown forms. We first propose a new variance estimator that corrects the finite-sample bias caused by temporal dependence. The new variance estimator is novel in the sense that it is not only robust to temporal dependence in finite samples but also consistent in large samples, as the effect of the correction vanishes as the sample size grows. Second, we propose a new reference distribution for hypothesis testing. The conventional reference distribution is based on a normal approximation that treats the variance estimate as if it were the true value. We employ fixed-smoothing asymptotics to reflect the randomness of the variance estimator. The fixed-smoothing asymptotic distribution is well approximated by a Student's t-distribution. Finally, we suggest a simulation-based calibration approach to choose smoothing parameters based on a testing-oriented criterion. Simulation shows that the proposed procedure works very well.

Keywords: HAR estimator, calibration, fixed-smoothing asymptotics, kernel estimator, robust inference, t-approximation, testing-optimal smoothing parameter choice, temporal dependence

JEL Classification Number: C12, C14, C22

1 Introduction

This paper proposes a new robust testing procedure for the nonparametric kernel method in the presence of temporal dependence of unknown forms. An important issue in hypothesis testing with time series data is how to account for unknown forms of dependence when calculating the standard error. Dependence is typical in time series, and an estimator computed from positively dependent data tends to have larger variation than one computed from iid data. If the dependence is not properly accounted for, the test may therefore over-reject.
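The over-rejection problem is easy to reproduce in the simplest setting, a t-test on a sample mean. The sketch below is only an illustration and is not taken from the paper: the AR(1) design, the sample size T = 200, the persistence value 0.7, and the helper names `simulate_ar1` and `rejection_rate` are all assumptions chosen for the example. It computes the usual iid-based t-statistic for a true null hypothesis and compares the rejection rates under iid and positively dependent data.

```python
import numpy as np


def simulate_ar1(n_rep, T, rho, rng):
    """Draw n_rep independent stationary AR(1) paths of length T (zero mean)."""
    eps = rng.standard_normal((n_rep, T))
    x = np.empty((n_rep, T))
    # Start from the stationary distribution so there is no burn-in effect.
    x[:, 0] = eps[:, 0] / np.sqrt(1.0 - rho**2)
    for t in range(1, T):
        x[:, t] = rho * x[:, t - 1] + eps[:, t]
    return x


def rejection_rate(T, rho, n_rep=2000, seed=0):
    """Rejection rate of the nominal 5% iid-based t-test of H0: mean = 0."""
    rng = np.random.default_rng(seed)
    x = simulate_ar1(n_rep, T, rho, rng)
    # Standard error that ignores dependence (treats the data as iid).
    t_stat = np.sqrt(T) * x.mean(axis=1) / x.std(axis=1, ddof=1)
    return float(np.mean(np.abs(t_stat) > 1.96))


iid_rate = rejection_rate(T=200, rho=0.0)   # data really are iid
dep_rate = rejection_rate(T=200, rho=0.7)   # positively dependent data
print(f"iid data:       rejection rate = {iid_rate:.3f}")
print(f"AR(1), rho=0.7: rejection rate = {dep_rate:.3f}")
```

The null hypothesis is true in both designs, so a well-behaved 5% test should reject about 5% of the time. Under positive dependence the iid standard error is too small, and the rejection rate rises far above the nominal level.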
For parametric models, this has been a well-researched problem since Newey and West (1987), and it is now standard practice to use the heteroskedasticity and autocorrelation robust (HAR) standard error in empirical studies. Andrews (1991) establishes its asymptotic properties rigorously. No such procedure has been proposed for nonparametric kernel methods. This stems from the fact that the distribution of a kernel estimator with dependent data is asymptotically equivalent to its distribution with iid data. See Robinson (1983) for details. This is in sharp contrast to the parametric case. The asymptotic equivalence implies that the usual standard error formulae for iid data remain valid for time series data in the asymptotic sense.

In finite samples, however, temporal dependence does affect the sampling distribution of a kernel estimator. In particular, when a process is highly persistent and/or the sample size is not large enough, the asymptotic variance tends to understate the true finite-sample variation of a kernel estimator, and this causes the usual asymptotic test to over-reject in finite samples. Conley, Hansen and Liu (1997) and Pritsker (1998) find this problem in kernel density estimation using short-term interest rate models.

Our testing procedure improves upon the conventional asymptotic tests in several respects. First, based on the "pre-asymptotic" variance of a kernel estimator, we construct a kernel-based HAR variance estimator that corrects the finite-sample bias caused by temporal dependence. The use of the "pre-asymptotic" variance is considered by Chen, Liao and Sun (2014) for sieve inference on time series models. The proposed HAR estimator is novel in the sense that it is not only robust to temporal dependence in finite samples but also consistent in large samples, as the effect of the correction vanishes as the sample size grows. We study its asymptotic properties rigorously.
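The finite-sample understatement of the variance can be checked by Monte Carlo for a kernel density estimator. The sketch below is an illustration under assumed settings, not the paper's simulation design: a Gaussian kernel, a Gaussian AR(1) process with unit marginal variance, T = 200, bandwidth h = 0.3, persistence 0.95, and the helper names `simulate_ar1` and `kde_at` are all choices made for the example. It compares the Monte Carlo variance of the density estimate at x0 = 0 with the textbook iid asymptotic variance f(x0) R(K) / (T h), where R(K) = 1 / (2 sqrt(pi)) for the Gaussian kernel.

```python
import numpy as np


def simulate_ar1(n_rep, T, rho, rng):
    """n_rep independent stationary AR(1) paths with N(0, 1) marginals."""
    eps = rng.standard_normal((n_rep, T)) * np.sqrt(1.0 - rho**2)
    x = np.empty((n_rep, T))
    x[:, 0] = rng.standard_normal(n_rep)  # stationary start
    for t in range(1, T):
        x[:, t] = rho * x[:, t - 1] + eps[:, t]
    return x


def kde_at(x, x0, h):
    """Gaussian-kernel density estimate at x0, one estimate per row of x."""
    u = (x - x0) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (x.shape[1] * h * np.sqrt(2 * np.pi))


T, h, x0, n_rep = 200, 0.3, 0.0, 2000
rng = np.random.default_rng(1)

# iid asymptotic variance f(x0) * R(K) / (T * h) with the true N(0, 1) density.
f0 = 1.0 / np.sqrt(2.0 * np.pi)
roughness = 1.0 / (2.0 * np.sqrt(np.pi))
asy_var = f0 * roughness / (T * h)

# Monte Carlo variance of the estimator under iid and persistent data.
mc_var_iid = kde_at(simulate_ar1(n_rep, T, 0.0, rng), x0, h).var()
mc_var_dep = kde_at(simulate_ar1(n_rep, T, 0.95, rng), x0, h).var()

print(f"iid asymptotic variance:        {asy_var:.5f}")
print(f"Monte Carlo variance, iid data: {mc_var_iid:.5f}")
print(f"Monte Carlo variance, rho=0.95: {mc_var_dep:.5f}")
```

Both designs share the same marginal density, so the asymptotic formula assigns them the same variance. In the persistent design the simulated variance is several times larger, which is the gap a pre-asymptotic correction is meant to close.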
We find that the asymptotic bias and variance of the HAR estimator are determined by two smoothing parameters: the truncation parameter of the HAR estimator, S_T, and the bandwidth parameter of the original kernel estimator, h_T. In parametric models, by contrast, the trade-off between bias and variance depends on the choice of S_T.

Second, we adopt fixed-smoothing asymptotics to characterize the limiting behavior of the HAR estimator and the associated t-statistic. The conventional reference distribution is based on a normal approximation that treats the variance estimate as if it were the true value. Under fixed-smoothing asymptotics, the ratio b of the truncation parameter S_T to the sample size T is assumed to be fixed as the sample size grows. Because the degree of smoothing is held fixed, the HAR estimator converges in distribution to a random variable that is proportional to the true variance. As a result, the t-statistic is asymptotically equivalent to the ratio of a standard normal variable to the square root of a random weighting variable. The randomness of the HAR estimator is embedded in this random weighting variable. The asymptotically equivalent distribution is nonstandard but easy to simulate, because it is a function of T iid standard normal variables. We also extend Sun (2014) to approximate the fixed-smoothing asymptotic distribution by a Student's t-distribution and establish its validity. Since Kiefer, Vogelsang and Bunzel (2000) and Kiefer and Vogelsang (2002a, 2002b, 2005),
Similar resources
Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search
In this paper, a method for predicting time series is presented. Time series prediction is a process that predicts future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields of engineering, economics, etc. The main purpose of using different models for time series prediction is to make the forecast with...
Discrimination of time series based on kernel method
Classical discrimination methods, such as linear and quadratic discrimination, are not efficient for non-Gaussian or nonlinear time series data. Nonparametric kernel discrimination, in which kernel estimators of the likelihood functions are used instead of their true values, has been shown to perform well. The misclassification rate of kernel discrimination is usually less than ...
Some New Methods for Prediction of Time Series by Wavelets
Extended Abstract. Forecasting is one of the most important purposes of time series analysis. For many years, classical methods were used for this aim. But these methods do not give good performance results for real time series due to non-linearity and non-stationarity of these data sets. On one hand, most of real world time series data display a time-varying second order structure. On th...
Interpolating time series based on fuzzy cluster analysis problem
This study proposes a model for interpolating time series so that they can be used for effective forecasting. The model is based on an improved fuzzy clustering analysis problem and is implemented as a Matlab procedure. The proposed model is illustrated with one data set and tested on many others, in particular the 3003 series of the M3-Competition data. Comparing to the exist...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...